Constructing Chinese-English Concept Space
Abstract
The information available on the World Wide Web in languages other than English is increasing significantly. According to a report from Computer Economics [1], 54% of Internet users are English speakers. However, over the next five years the number of English-speaking Internet users is predicted to grow by only 60%, whereas the number of non-English-speaking users is predicted to grow by 150%. By 2005, 57% of Internet users will be non-English speakers. A report by Techserver [2] shows that the number of Internet servers in China increased from 505,000 to 1.175 million between January and June 1999. All of this evidence shows how important cross-lingual research will be for meeting users' needs in the near future. Cross-lingual information retrieval has recently become one of the focuses of digital library research. The NSF/DARPA/ANSA-funded Digital Library Initiative 2 in the U.S. is actively promoting activities and processes that cross the boundaries of language and politics, and a new program in International Digital Libraries Collaborative Research has been initiated [23]. The Internet 2 project will also be multilingual. In the Digital Library Initiative 1, researchers did extensive work on structural and semantic interoperability, and searching and retrieving objects across variations in protocols, formats and disciplines has been widely explored. However, research on crossing language boundaries, especially between European and Oriental languages, is still at an early stage. In this paper, we focus on cross-lingual semantic interoperability and present the construction of a Chinese-English cross-lingual concept space using a Hopfield network based on a parallel corpus. Such a concept space is important for building an accurate cross-lingual information retrieval system. Experiments are conducted to measure the precision of the constructed concept space and the performance of the algorithm.
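The abstract names a Hopfield network as the mechanism for building the concept space but gives no details of the weights or the update rule. The following is a minimal Python sketch of one common way such a network can be used for concept exploration. It assumes that association weights are estimated from co-occurrence in aligned sentence pairs and that activation is spread from query terms through a sigmoid until the network settles. The function names (`cooccurrence_weights`, `hopfield_activate`, `related_terms`), the weight formula P(b | a), the parameters `theta`, `theta0`, `epsilon` and `min_activation`, and the clamping of query nodes are all hypothetical choices for illustration. They are not the paper's formulas or settings.

```python
import math
from collections import defaultdict


def cooccurrence_weights(aligned_pairs):
    """Estimate asymmetric association weights from aligned text pairs.

    Each (english_terms, chinese_terms) pair is treated as one bilingual unit.
    w[a][b] = (#units containing a and b) / (#units containing a), an estimate
    of P(b | a). Hypothetical simplified formula, not the paper's own.
    """
    unit_freq = defaultdict(int)
    co_freq = defaultdict(lambda: defaultdict(int))
    for english_terms, chinese_terms in aligned_pairs:
        unit = set(english_terms) | set(chinese_terms)
        for a in unit:
            unit_freq[a] += 1
            for b in unit:
                if a != b:
                    co_freq[a][b] += 1
    return {a: {b: n / unit_freq[a] for b, n in row.items()}
            for a, row in co_freq.items()}


def hopfield_activate(weights, query_terms, theta=0.5, theta0=0.1,
                      epsilon=1e-3, max_iter=50):
    """Spread activation from the query terms until the network settles.

    Each step computes net_j = sum_i w[i][j] * mu_i and squashes it with the
    sigmoid 1 / (1 + exp(-(net_j - theta) / theta0)). Query nodes are clamped
    at 1.0, which is an assumption of this sketch.
    """
    activation = {t: 1.0 for t in query_terms}
    for _ in range(max_iter):
        net = defaultdict(float)
        for source, mu in activation.items():
            for target, w in weights.get(source, {}).items():
                net[target] += w * mu
        new = {t: 1.0 / (1.0 + math.exp(-(x - theta) / theta0))
               for t, x in net.items()}
        for t in query_terms:
            new[t] = 1.0  # keep the query terms fully active
        # Total change in activation; stop once it falls below epsilon.
        delta = sum(abs(new.get(t, 0.0) - activation.get(t, 0.0))
                    for t in set(new) | set(activation))
        activation = new
        if delta <= epsilon:
            break
    return activation


def related_terms(activation, query_terms, min_activation=0.5):
    """Rank non-query terms whose final activation reaches a cut-off."""
    ranked = [(t, a) for t, a in activation.items()
              if t not in query_terms and a >= min_activation]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy aligned corpus (hypothetical data for illustration only).
    corpus = [
        (["digital", "library"], ["数字", "图书馆"]),
        (["digital", "library"], ["数字", "图书馆"]),
        (["information", "retrieval"], ["信息", "检索"]),
    ]
    weights = cooccurrence_weights(corpus)
    activation = hopfield_activate(weights, ["library"])
    print(related_terms(activation, ["library"]))
```

Because every weight is non-negative and the sigmoid is monotone, the activations in this sketch can only rise from one step to the next while staying at or below 1. They therefore settle, and `max_iter` acts only as a safety cap. Terms that never co-occur with an activated term, such as 检索 in the toy corpus, receive no activation at all.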
Similar Resources
Chinese-English Parallel Corpus Construction and its Application
Chinese-English parallel corpora are key resources for Chinese-English cross-language information processing, Chinese-English bilingual lexicography, and Chinese-English language research and teaching. However, no large-scale Chinese-English parallel corpus is available yet, given the difficulties and the intensive labor required. In this paper, our work towards building a large-scale Chinese-Engli...
Automatic English-Chinese Name Transliteration for Development of Multilingual Resources
In this paper, we describe issues in the translation of proper names from English to Chinese which we have faced in constructing a system for multilingual text generation supporting both languages. We introduce an algorithm for mapping from English names to Chinese characters based on (1) heuristics about relationships between English spelling and pronunciation, and (2) consistent relationships...
BiFrameNet: Bilingual Frame Semantics Resource Construction by Cross-lingual Induction
We present a novel automatic approach to constructing a bilingual semantic network—the BiFrameNet, to enhance statistical and transfer-based machine translation systems. BiFrameNet is a frame semantic representation, and contains semantic structure transfers between English and Chinese. The English FrameNet and the Chinese HowNet provide us with two different views of the semantic distribution ...
Constructing of a Large-Scale Chinese-English Parallel Corpus
This paper describes the construction of a large-scale (over 500,000 sentence pairs) Chinese-English parallel corpus. The current status of Chinese corpora is reviewed, with emphasis on parallel corpora. The XML coding principles for the Chinese–English parallel corpus are discussed. The sentence alignment algorithm used in this project is described, together with a computer-aided checking process. F...
Generating Cross-lingual Concept Space from Parallel Corpora on the Web
The information available in languages other than English on the World Wide Web is increasing significantly. To cross language boundaries between different languages, dictionaries are the most typical tools. However, the general-purpose dictionary is less sensitive to genre and domain, and it is impractical to manually construct tailored bilingual dictionaries or sophisticated multilingual thesa...
Publication date: 2000